
ENH: cov: expose correction and weights parameters#690

Open
bruAristimunha wants to merge 13 commits intodata-apis:mainfrom
bruAristimunha:cov_parameters

Conversation

@bruAristimunha

Resolves #688.

Summary

  • Adds axis, correction, frequency_weights, and weights parameters to xpx.cov, unlocking the degrees-of-freedom and weighted variants that numpy.cov and torch.cov already support.
  • Naming follows array-api conventions (axis, correction) used elsewhere in this library rather than numpy's (rowvar, bias, ddof). The docstring includes a one-to-one mapping for users migrating from numpy.cov.

Design

The delegation moves observations to the last axis via xp.moveaxis, which collapses rowvar out of backend dispatch entirely — only ddof (numpy/cupy/dask/jax) vs correction (torch) differs between branches.
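The moveaxis idea can be sketched in plain numpy (a minimal illustration with a hypothetical helper name, not the actual delegation code): once observations are moved to the last axis, the backend call never needs `rowvar`.

```python
import numpy as np

def _cov_numpy_like(m, axis, correction):
    # Move the observation axis last, so observations always lie along
    # the final axis and rowvar drops out of the backend dispatch.
    m = np.moveaxis(m, axis, -1)
    return np.cov(m, rowvar=True, ddof=correction)

rng = np.random.default_rng(0)
x = rng.standard_normal((100, 3))  # 100 observations of 3 variables
a = _cov_numpy_like(x, axis=0, correction=1)
b = np.cov(x, rowvar=False, ddof=1)
assert np.allclose(a, b)
```

The same normalization is what lets the torch branch differ from the numpy-family branches only in the `ddof` vs `correction` keyword.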

Fallbacks to the generic implementation (_funcs.cov):

  • m.ndim > 2 (batched input, not supported natively by any backend).
  • Non-integer correction (rejected by numpy.cov's ddof).
  • Dask with weights — dask.array.cov forces .compute() on a lazy 0-D scalar via its internal if fact <= 0 check. The generic path stays fully lazy because its weighted branch doesn't compare fact to zero (noted in docstring).

Weighted formula in _funcs.cov matches numpy's algebraically: c = (m_c * w) @ m_c.T / (v1 - correction * v2 / v1), where m_c is the weighted-mean-centered data, w = frequency_weights * weights, v1 = sum(w), and v2 = sum(w * weights).
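The formula can be checked against numpy directly; this sketch uses numpy's parameter names (`fweights`, `aweights`, `ddof`) rather than this PR's:

```python
import numpy as np

rng = np.random.default_rng(0)
m = rng.standard_normal((3, 50))   # 3 variables x 50 observations
fw = rng.integers(1, 5, size=50)   # frequency weights (integers)
aw = rng.random(50) + 0.5          # reliability weights

w = fw * aw
v1 = w.sum()
v2 = (w * aw).sum()
avg = np.average(m, axis=1, weights=w)
m_c = m - avg[:, None]             # center by the weighted mean
fact = v1 - 1 * v2 / v1            # correction = 1 (unbiased)
c = (m_c * w) @ m_c.T / fact

assert np.allclose(c, np.cov(m, fweights=fw, aweights=aw, ddof=1))
```

Note that the denominator never gets compared to zero here, which is why this path can stay lazy on dask.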

Tests

New TestCov cases validate against np.cov as reference:

  • test_correction (integer ddof)
  • test_correction_float (generic-path-only, hand-computed reference)
  • test_axis / test_axis_with_weights / test_axis_out_of_bounds
  • test_frequency_weights / test_weights / test_both_weights
  • test_batch_with_weights

Test plan

  • pytest tests/test_funcs.py::TestCov — 126 passed across numpy, torch, jax, dask, array-api-strict
  • pytest tests/test_funcs.py full — 4263 passed, 0 failed
  • lefthook run pre-commit — ruff, numpydoc, mypy, pyright, typos all green
  • Dask laziness verified — lazy_xp_function(cov) asserts 0 .compute() calls, holds for weighted path via the fallback

Resolves data-apis#688. Adds `axis`, `correction`, `frequency_weights`, and
`weights` to `cov`, giving users control over the degrees-of-freedom
correction and the observation-axis / weighted variants that
`numpy.cov` and `torch.cov` already support.

Naming follows array-api conventions (`axis`, `correction`) rather
than numpy's (`rowvar`, `bias`, `ddof`); the docstring includes a
one-to-one mapping. The delegation moves observations to the last
axis via `xp.moveaxis`, collapsing `rowvar` out of the backend
dispatch — only `ddof` vs `correction` differs between branches.

Dask's native `cov` forces `.compute()` on a lazy scalar when any
weights are given, so weighted dask inputs fall through to the
generic implementation, which is fully lazy.
@betatim
Member

betatim commented Apr 20, 2026

It looks like the cov you are adding follows the pytorch signature; can you explain a bit why you chose that? In my PR I thought following the NumPy API made sense, since most libraries seem to use it.

The PR description mentions that other functions in this library already use correction and axis. Is that a good reason to also do it here? Interested in your thinking.

Comment thread src/array_api_extra/_delegation.py
Comment thread src/array_api_extra/_delegation.py
Comment thread src/array_api_extra/_lib/_funcs.py Outdated
@bruAristimunha
Author

Hey @betatim!

This was a hard decision to make, but I can follow numpy more strictly if you prefer.

I basically looked at what is already implemented in the array API standard and how it handles the parameter names I was trying to introduce.

For each parameter (bias, rowvar, ddof, fweights, and aweights), I checked how numpy's names had been translated here in the past.

For bias and ddof becoming correction, I noticed that xp.var, xp.std, and I think xp.sum rename the numpy parameters to the array API specification names:

https://data-apis.org/array-api/latest/API_specification/generated/array_api.var.html
https://data-apis.org/array-api/latest/API_specification/generated/array_api.std.html

There was a discussion about using correction instead of bias+ddof for these functions: it was introduced in data-apis/array-api#10, with further discussion later in data-apis/array-api#695, led by @kgryte.

For rowvar becoming axis, I just followed the signature of the other functions; axis seems to be the convention they settled on.

For frequency_weights and weights, the decision came from my experience with pyRiemann. The only place I remember using something similar is statsmodels (freq_weights, var_weights): https://www.statsmodels.org/stable/generated/statsmodels.genmod.generalized_linear_model.GLM.html#statsmodels.genmod.generalized_linear_model.GLM.freq_weights

I think scikit-learn uses sample_weight more, but I can accommodate any request here.

Member

@betatim betatim left a comment


What is your thinking on validating the weights passed in? Things like checking the shapes make sense, that they are all positive (is this actually required? how does it fit with being lazy?)

@bruAristimunha
Author

I like this idea a lot, @betatim! I think it will make the checks in libraries that use array-api-extra much lighter.

@bruAristimunha
Author

FYI @qbarthelemy and @agramfort

Comment thread src/array_api_extra/_lib/_funcs.py Outdated
Co-authored-by: Quentin Barthélemy <q.barthelemy@gmail.com>
Comment thread src/array_api_extra/_delegation.py Outdated
bruAristimunha and others added 2 commits April 20, 2026 13:39
Co-authored-by: Quentin Barthélemy <q.barthelemy@gmail.com>
@betatim
Member

betatim commented Apr 20, 2026

Thanks a lot for the detailed answer in #690 (comment) - I didn't realise there was precedent for using correction in functions like var. I think it makes sense to copy that and use correction for cov as well. Worth making the translation!

What is the "temporary deployed" thing that keeps happening?

@bruAristimunha
Author

It is not me, @betatim; I think it is something @lucascolley is pushing here: #699

@bruAristimunha
Author

Happy that you liked the response @betatim :)

I think I addressed all the points from you and @qbarthelemy. Can we merge?

@lucascolley
Member

What is the "temporary deployed" thing that keeps happening?

fixed in bd3652a

@lucascolley lucascolley changed the title ENH: expose correction and weights parameters in cov ENH: cov: expose correction and weights parameters Apr 20, 2026
Member

@lucascolley lucascolley left a comment


I took an initial look, seems pretty good!

One high level comment @bruAristimunha — could you demonstrate that this works as expected when used in a branch of sklearn? You should be able to change https://github.com/scikit-learn/scikit-learn/blob/06aded051fe6c7c9970b7e13c3669f952a799831/maint_tools/vendor_array_api_extra.sh#L8-L9 to point to this branch and commit hash.

Comment thread src/array_api_extra/_lib/_funcs.py
Comment thread src/array_api_extra/_delegation.py Outdated
Comment thread src/array_api_extra/_delegation.py
Comment thread src/array_api_extra/_delegation.py Outdated
Comment thread src/array_api_extra/_lib/_funcs.py
Comment thread src/array_api_extra/_lib/_funcs.py Outdated
@bruAristimunha
Author

hey @betatim,

As you have the first covariance PR on scikit-learn, could you help with this small test requested by @lucascolley?

One high level comment @bruAristimunha — could you demonstrate that this works as expected when used in a branch of sklearn? You should be able to change https://github.com/scikit-learn/scikit-learn/blob/06aded051fe6c7c9970b7e13c3669f952a799831/maint_tools/vendor_array_api_extra.sh#L8-L9 to point to this branch and commit hash.

@bruAristimunha
Author

hey @lucascolley,

I made it in my branch, built on top of @betatim's work for scikit-learn's first covariance; you can check it here: scikit-learn/scikit-learn#33600

@lucascolley
Member

hey @lucascolley,

I made it in my branch, built on top of @betatim's work for scikit-learn's first covariance; you can check it here: scikit-learn/scikit-learn#33600

thanks! Would be great if you could take a look, Tim

Comment thread src/array_api_extra/_lib/_funcs.py Outdated
bruAristimunha and others added 3 commits April 20, 2026 22:41
Co-authored-by: Quentin Barthélemy <q.barthelemy@gmail.com>
Addresses review feedback (kgryte, betatim) that the motivation for
allowing non-integer correction was not obvious from the docstring:
weighted unbiased correction and autocorrelated data both require
fractional values.
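The fractional-correction motivation can be made concrete in pure numpy, independently of this PR's code: the unbiased weighted variance needs an effective correction of v2/v1, which is generally non-integer.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(40)
w = rng.random(40) + 0.5            # reliability (aweights-style) weights

v1, v2 = w.sum(), (w * w).sum()
mu = np.average(x, weights=w)
# Unbiased weighted variance: the denominator is v1 - v2/v1, so the
# effective degrees-of-freedom correction v2/v1 is fractional.
var_unbiased = (w * (x - mu) ** 2).sum() / (v1 - v2 / v1)

assert np.isclose(var_unbiased, np.cov(x, aweights=w, ddof=1))
```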
Adds tests for the 1-D shape and length checks in the generic cov
path. Raises the diff coverage for this PR from 93.33% to 100%.
@lucascolley lucascolley added this to the 0.10.2 milestone Apr 24, 2026
@bruAristimunha
Author

hey @lucascolley,

I was wondering: could you please approve the CI for the final test run?


Labels

enhancement New feature or request


Development

Successfully merging this pull request may close these issues.

EHN: make covariance more flexible
